Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting
We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.
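The quadratic penalty described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: it assumes a single fully connected layer whose Hessian is approximated by a Kronecker product A ⊗ G of an input-covariance factor A and a gradient-covariance factor G, and evaluates the penalty (λ/2)·vec(W−W*)ᵀ(A ⊗ G)vec(W−W*) without ever forming the full Kronecker product. All names (`W_star`, `laplace_penalty`, `lam`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shape: weight matrix W is (out_dim, in_dim).
out_dim, in_dim = 4, 3

# Mode of the posterior after the previous task (MAP weights).
W_star = rng.normal(size=(out_dim, in_dim))

# Kronecker factors of the layer's curvature, both positive
# semi-definite by construction: A plays the role of the input
# covariance (in_dim x in_dim), G of the covariance of the
# backpropagated gradients (out_dim x out_dim), so H ~= A kron G.
X = rng.normal(size=(10, in_dim))
A = X.T @ X / 10
D = rng.normal(size=(10, out_dim))
G = D.T @ D / 10

lam = 1.0  # strength of the quadratic penalty on weight changes


def laplace_penalty(W):
    """Quadratic penalty (lam/2) * vec(dW)^T (A kron G) vec(dW).

    Uses the identity vec(dW)^T (A kron G) vec(dW)
      = trace(G @ dW @ A @ dW.T),
    so the (in*out) x (in*out) Kronecker product is never built.
    In the online setting the factors would be accumulated over
    tasks; here a single task's factors are shown.
    """
    dW = W - W_star
    return 0.5 * lam * np.trace(G @ dW @ A @ dW.T)
```

During training on a new task, this penalty would be added to the task loss, pulling the weights toward the previous mode most strongly along directions of high curvature.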
Reviews: Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting
This work proposes the application of a Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. My main criticism of this paper is its lack of novelty/originality. As mentioned in the paper, using online Laplace propagation for continual learning of neural networks has already been explored in elastic weight consolidation (EWC) and its variants. Also, using a Kronecker factored approximation of the Hessian has already been studied by Botev et al. Still, I think this work provides a useful contribution to the field by building on the popular framework of applying Laplace approximations with state-of-the-art Hessian approximations, and it might be worth accepting to the conference.
Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting
Ritter, Hippolyt, Botev, Aleksandar, Barber, David